-
Notifications
You must be signed in to change notification settings - Fork 88
feat: [2/4] integrate smtforest, avoid ser/de of full account/vault data in database #1394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7980bed to
771f6d7
Compare
771f6d7 to
b2aa54b
Compare
b2aa54b to
2964a93
Compare
Mirko-von-Leipzig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly questions; no real issues found as yet
3bd5451 to
c5a199a
Compare
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I reviewed most of the non-test code and left some small comments inline.
Also, my understanding is that in this PR, we are just updating the SMT forest, but actually getting data from it would be done in the next PR, right?
That is correct. Querying and populating partial queries/responses is done in 3/4 |
bobbinth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left some more comments inline - all of them should be pretty straight-forward. Once they are addressed, we should be good to merge.
| /// Retrieves the most recent vault SMT root for an account. | ||
| /// | ||
| /// Returns the latest vault root entry regardless of block number. | ||
| /// Used when applying incremental deltas where we always want the previous state. | ||
| fn get_latest_vault_root(&self, account_id: AccountId) -> Word { | ||
| self.vault_roots | ||
| .range((account_id, BlockNumber::GENESIS)..) | ||
| .take_while(|((id, _), _)| *id == account_id) | ||
| .last() | ||
| .map_or_else(Self::empty_smt_root, |(_, root)| *root) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's mention in the doc comments that if a vault root for the specified account cannot be found, we return a root of an empty tree.
Also, looking at the implementation, I wonder if a better backing structure would be BTreeMap<AccountId, (BlockNumber, Word)>. This would make getting the last vault root much simpler.
Getting root for a specific block number should be pretty simple as well (we could use a binary search or even just a linear scan since we know that the number of entries will be pretty small per account ID).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that works. When we'd get a request for a historical account vault, we need the Word at a specific BlockNumber in order to extract the root. The mapping hence would become BTreeMap<AccountId, Vec<(BlockNumber, Word)>> (we'd need that extra dimension in the value) but I think it'd make it more complicated than needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, you are correct - we'd need BTreeMap<AccountId, Vec<(BlockNumber, Word)>>. I think this still may be preferable (and we could create a newtype to wrap over Vec<(BlockNumber, Word)).
My main concern here is that I'm not sure what the impact of .range((account_id, BlockNumber::GENESIS)..) and .take_while() here would be? Does this mean we need to iterate over most of the entries in vault_roots?
Also, indexing on AccountId only would allow us to use a HashMap which should be quite a bit faster than BTreeMap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a brief look, and using next_back() is O(log(n)) vs O(n), this is implemented in 3/4
| /// Retrieves the most recent storage map SMT root for an account slot. | ||
| /// | ||
| /// Returns the latest storage root entry regardless of block number. | ||
| /// Used when applying incremental deltas where we always want the previous state. | ||
| fn get_latest_storage_map_root( | ||
| &self, | ||
| account_id: AccountId, | ||
| slot_name: &StorageSlotName, | ||
| ) -> Word { | ||
| self.storage_roots | ||
| .range((account_id, slot_name.clone(), BlockNumber::GENESIS)..) | ||
| .take_while(|((id, name, _), _)| *id == account_id && name == slot_name) | ||
| .last() | ||
| .map_or_else(Self::empty_smt_root, |(_, root)| *root) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as above - though, here the data structure would be BTreeMap<(AccountId, StorageSlotName), (BlockNumber, Word)>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is re performance, I think we can always assume a block number and require the user to give us the latest_block_num - which would return to the original API proposed in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reply as in #1394 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also implemented in #1394 (comment) as next_back() to become much cheaper as well as bounding the upper end of the range
|
I am in favor to squash merge and track the remaining comments in #1498 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part 2/4 of #1185 and lays the ground for #1354
Intent
AccountInfocore changes
vault/storage_mapBLOB entries from the accounts table.SmtForestand integrate intoapply_blockandState::loadsignificant changes in the following files:
crates/store/src/db/schema.rsintroducesaccount_storage_headersand removesstorage(the full serialized account storage) fromaccountstablecrates/store/src/state.rs/fn State::apply_blocknow updates the database, but also the lookup table of roots for theSmtForestand the entries in the forest itself, indirect lookup tablesout of scope
LargeSmtForestfor partial storage maps - [devops] Deploy without nuking db #670GetAccountProofendpoint #1354(Large|)SmtForest- Add cleanup routine for outdated historical storage entries #1355how to review
Take the existing TODOs into consideration, if they make sense. This will be the follow up PR.